Capturing Term Dependencies using a Sentence Tree based Language Model
Authors
Abstract
We describe a new probabilistic Sentence Tree Language Modeling approach that captures term dependency patterns for Topic Detection and Tracking's (TDT) Story Link Detection task. New features of the approach include modeling the syntactic structure of sentences in documents through a sentence-bin approach, and a computationally efficient algorithm that captures the most significant sentence-level term dependencies using a Maximum Spanning Tree, similar to van Rijsbergen's modeling of document-level term dependencies. The new model is a good discriminator of on-topic and off-topic story pairs, providing evidence that sentence-level term dependencies carry significant information about relevance. Although runs on a subset of the TDT2 corpus show that the model is outperformed by the unigram language model, a mixture of the unigram and Sentence Tree models improves on the best performance, especially in the low-false-alarm region.
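The core construction the abstract describes, a maximum spanning tree over term-pair dependencies in the spirit of van Rijsbergen's dependence trees, can be sketched as follows. This is an illustration, not the authors' code: the weighting here is a plain sentence-level co-occurrence count, whereas the paper's actual dependency weighting and smoothing are not specified in this abstract.

```python
# Illustrative sketch: build a maximum spanning tree over term pairs,
# weighted by how often the two terms co-occur in a sentence.
# The co-occurrence-count weighting is an assumption for illustration.
from itertools import combinations

def dependency_tree(sentences):
    """Return maximum-spanning-tree edges (term_a, term_b, weight)."""
    weight = {}
    vocab = set()
    for sent in sentences:
        terms = set(sent.split())
        vocab |= terms
        # Count each unordered term pair once per sentence.
        for a, b in combinations(sorted(terms), 2):
            weight[(a, b)] = weight.get((a, b), 0) + 1

    # Kruskal's algorithm, taking edges in decreasing weight order,
    # yields a MAXIMUM spanning tree (forest, if the graph is disconnected).
    parent = {t: t for t in vocab}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (a, b), w in sorted(weight.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:          # keep the edge only if it joins two components
            parent[ra] = rb
            tree.append((a, b, w))
    return tree

sents = ["the cat sat", "the cat ran", "a dog ran"]
edges = dependency_tree(sents)
# The strongest dependency, ("cat", "the") with count 2, enters the tree first.
```

A model along these lines would then score a story pair by walking the tree edges, combining the pairwise dependencies with the unigram estimates, which is where the mixture with the unigram language model mentioned above would enter.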
Similar Resources
Larger-Context Language Modelling
In this work, we propose a novel method to incorporate corpus-level discourse information into language modelling. We call this larger-context language model. We introduce a late fusion approach to a recurrent language model based on long short-term memory units (LSTM), which helps the LSTM unit keep intra-sentence dependencies and inter-sentence dependencies separate from each other. Through t...
Using a Supertagged Dependency Language Model to Select a Good Translation in System Combination
We present a novel, structured language model Supertagged Dependency Language Model to model the syntactic dependencies between words. The goal is to identify ungrammatical hypotheses from a set of candidate translations in a MT system combination framework and help select the best translation candidates using a variety of sentence-level features. We use a two-step mechanism based on constituen...
A Study on Effectiveness of Syntactic Relationship in Dependence Retrieval Model
To relax the Term Independence Assumption, Term Dependency is introduced and it has improved retrieval precision dramatically. There are two kinds of term dependencies, one is defined by term proximity, and the other is defined by linguistic dependencies. In this paper, we take a comparative study to re-examine these two kinds of term dependencies in dependence language model framework. Syntact...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
Larger-Context Language Modelling with Recurrent Neural Network
In this work, we propose a novel method to incorporate corpus-level discourse information into language modelling. We call this larger-context language model. We introduce a late fusion approach to a recurrent language model based on long short-term memory units (LSTM), which helps the LSTM unit keep intra-sentence dependencies and inter-sentence dependencies separate from each other. Through th...
Journal:
Volume/Issue:
Pages: -
Publication date: 2002